DPLASMA Warmup -- 2nd try #69

Merged: 6 commits into ICLDisco:master on Jun 27, 2023

Conversation

therault (Contributor)

This is a second try at solving the warmup issue in DPLASMA (especially in CUDA codes).

Here are some performance measurements of the approach proposed in this PR, on Leconte (8x V100):

[Figure: dpotrf on Leconte, warmup vs. no-warmup (averages)]

[Figure: dpotrf on Leconte, warmup vs. no-warmup (per-run details)]

'gflops/avg' represents the ratio of a run's gflops to the 'appropriate average': for runs without warmup, the average excluding the outlier; for runs with warmup, the average of all measured points.
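
As a concrete reading of that metric, here is a minimal sketch (not code from this PR) of how the 'appropriate average' could be computed, assuming the outlier excluded on no-warmup runs is the first, cold run:

```c
#include <stddef.h>

/* Sketch only: compute the 'appropriate average' described above.
 * Assumption (not stated in the PR): without warmup, the excluded
 * outlier is the first, cold run. */
static double appropriate_average(const double *gflops, size_t n, int warmed_up)
{
    size_t start = warmed_up ? 0 : 1;  /* skip the cold first run if no warmup */
    double sum = 0.0;
    if (n <= start) return 0.0;        /* not enough measurements */
    for (size_t i = start; i < n; i++)
        sum += gflops[i];
    return sum / (double)(n - start);
}

/* 'gflops/avg' for run i is then gflops[i] / appropriate_average(...). */
```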

There is still an unidentified warmup problem at large tile sizes (512 and 1024): for 1 to 4 GPUs, the first actual run is still slower than the others at small problem sizes. The source of this issue is unclear at this point, but the warmup patch fixes most of the CUDA/cuBLAS warmup issues.

The goal of the current code is to include changes for all tests that feature both a CUDA implementation and timing:

  • POTRF
  • GEMM (WIP)
  • POINV
  • GEQRF

TRSM is the last kernel that features a CUDA implementation, and its testing does not include timing.

During the discussion, Aurelien pointed out an issue with HIP: memory allocation on the HIP device was lazy at some point, and allocation at first touch is a significant part of the warmup overhead of the HIP runs. We decided that this should be solved at the PaRSEC level, during memory allocation, and not at the DPLASMA warmup level.
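
For illustration only (per the discussion, the real fix belongs in PaRSEC's allocation path, not in this PR), here is a minimal HIP sketch of eager first-touch: allocate, then write the whole buffer once so the lazy-allocation cost is paid before any timed run:

```c
#include <hip/hip_runtime.h>
#include <stddef.h>

/* Sketch only: make a HIP device allocation "eager" by touching every
 * byte once at allocation time, so first-touch overhead is not charged
 * to the first timed run. */
static void *alloc_and_touch(size_t bytes)
{
    void *dev_ptr = NULL;
    if (hipMalloc(&dev_ptr, bytes) != hipSuccess)
        return NULL;
    /* Writing the whole buffer forces the pages to be backed now. */
    if (hipMemset(dev_ptr, 0, bytes) != hipSuccess) {
        (void)hipFree(dev_ptr);
        return NULL;
    }
    (void)hipDeviceSynchronize();  /* make sure the touch has completed */
    return dev_ptr;
}
```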

therault (Contributor, Author)

Done: all PTG tests that are CUDA-enabled

To do:

  • DTD tests that are CUDA-enabled
  • Update performance of CUDA-enabled tests on recent machines

bosilca (Contributor)

bosilca commented Mar 31, 2023

As discussed on 03/31/23, we need to:

  • cover the CPU-only tests (in addition to the device tests)

  • touch the entire data at least once on the GPU (at the level of PaRSEC data, during the allocation stage)

therault (Contributor, Author)

therault commented Apr 6, 2023

  • check what is happening in paranoid debug mode: tests on Frontier show that the local data distributions generated via the warmup calls trigger (wrongful?) asserts
  • GPU load statistics should be reset after the completion of each warmup test. Shouldn't that happen automatically via the completion of tasks? parsec#01592dc6 adds a function to do that (explicit call; see the sketch below)
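
If that referenced commit is the one used here, the testers would make an explicit reset call between the warmup and the timed runs. A sketch follows; the function name parsec_devices_reset_load is assumed from the referenced parsec commit, not verified against it:

```c
#include <parsec.h>

/* Sketch only: between warmup and timed runs, explicitly reset the
 * per-device load statistics so the scheduler's load balancing starts
 * fresh. parsec_devices_reset_load() is an assumed name taken from the
 * referenced parsec commit, not verified here. */
static void reset_after_warmup(parsec_context_t *parsec)
{
    parsec_devices_reset_load(parsec);
}
```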

therault marked this pull request as ready for review June 2, 2023 14:44
therault requested a review from a team as a code owner June 2, 2023 14:44
… in testers, to provide a simple way to get consistent performance results.

Implementation of warmup_zpotrf in testing_zpotrf.c

testing_zpotrf: use zplghe and not zplrnt to initialize symmetric positive definite matrices; call the warmup function as it has been validated experimentally
Port warmup to testing_zpoinv
Port warmup to QR (PTG). Looks like CUDA-QR is having some issues.
Support warmup for GEMM -- Only assign a preferred device in zgemm_NN_gpu.jdf if the upper level has not assigned one, allowing the user fine control over where tasks execute (and the warmup process definitely wants that control)

Update to current parsec, and enable warmup in testing_zgebrd_ge2g

Port warmup to testing_zgelqf_hqr

Port testing_zgelqf_systolic

Fix some bugs in testing_zpotrf.c's warmup

Add warmup for zgetrf*, zpoinv, and zpotrf_dtd*

Use the same zgeqrf warmup for dtd tests

Use the same warmup for testing_zgemm and testing_zgemm_dtd

Porting warmup on zgelqf

Add loop and warmup to testing_zheev

Add warmup and performance measurement loop to GEQRF HQR and Systolic

Implement new warmup strategy when no known GPU implementation exists

 - if there is a known GPU implementation, just assume we need to warm up
   once per device
 - if there is no known GPU implementation, iterate over the task classes,
   and check if a GPU implementation exists. If that is the case, run a warmup
   for each device of that type. That codepath will be skipped until someone
   implements a GPU version for all operations... Worst case, it will not
   work properly but will not break the test. Best case, we will not forget
   to do warmup for GPU cases.
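
A minimal sketch of that task-class scan follows; the field and constant names (nb_task_classes, task_classes_array, incarnations, PARSEC_DEV_NONE, PARSEC_DEV_CUDA, PARSEC_DEV_HIP) reflect PaRSEC internals as assumed here, not code taken from this PR:

```c
#include <stdbool.h>
#include <parsec.h>

/* Sketch only: decide whether a taskpool needs a per-device warmup by
 * checking each task class for a GPU incarnation. */
static bool taskpool_has_gpu_incarnation(const parsec_taskpool_t *tp)
{
    for (unsigned i = 0; i < tp->nb_task_classes; i++) {
        const parsec_task_class_t *tc = tp->task_classes_array[i];
        /* The incarnation list is terminated by PARSEC_DEV_NONE. */
        for (unsigned j = 0; tc->incarnations[j].type != PARSEC_DEV_NONE; j++) {
            if (tc->incarnations[j].type & (PARSEC_DEV_CUDA | PARSEC_DEV_HIP))
                return true;  /* this task class has a GPU body */
        }
    }
    return false;  /* no GPU body anywhere: skip the warmup codepath */
}
```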

Add the warmup/loop to ZGESVD

Fix GEMM warmups: GEMM uses reshaping to support ScaLAPACK + TILED
data representations, and the data collection wrapper does not
work well with the hack of changing the rank_of function in the
source data collections. Simply do a 1D distribution of A and C
over all the ranks to ensure that all processes initialize GEMM
in the warmup.
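
To illustrate that workaround, a hypothetical 1D cyclic rank_of for the warmup distribution (all names invented for this sketch): tile coordinates are flattened and dealt out cyclically, so every rank owns some tiles of A and C and therefore takes part in the GEMM warmup:

```c
#include <stdint.h>

/* Sketch only: 1D cyclic mapping from tile (m, n) to an owning rank.
 * 'nt' is the number of tile columns, 'nb_ranks' the number of processes;
 * as soon as there are at least nb_ranks tiles, every process owns data. */
static uint32_t warmup_rank_of_1d(int m, int n, int nt, int nb_ranks)
{
    return (uint32_t)(((int64_t)m * (int64_t)nt + n) % nb_ranks);
}
```
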
abouteiller (Contributor)

  • GPU load statistics should be reset after the completion of each warmup test. Shouldn't that happen automatically via the completion of tasks? parsec#01592dc6 adds a function to do that (explicit call)

This is done in #89

src/zgeqrf.jdf (review thread, resolved)
abouteiller merged commit a0ea91c into ICLDisco:master on Jun 27, 2023